70 research outputs found

    Maximum Likelihood Pitch Estimation Using Sinusoidal Modeling

    The aim of the work presented in this thesis is to automatically extract the fundamental frequency of a periodic signal from noisy observations, a task commonly referred to as pitch estimation. An algorithm for optimal pitch estimation using a maximum likelihood formulation is presented. The speech waveform is modeled using sinusoidal basis functions that are harmonically tied together to explicitly capture the periodic structure of voiced speech. Pitch estimation is cast as a model selection problem, and the Akaike Information Criterion (AIC) is used to estimate the pitch. The algorithm is compared with several existing pitch detection algorithms (PDAs) on a reference pitch database, and the results indicate that it outperforms most of them. The application of parametric modeling to single-channel speech segregation and the use of mel-frequency cepstral coefficients for sequential grouping are analyzed on the speech separation challenge database.
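The core idea — fit harmonically tied sinusoids for each candidate fundamental and select the pitch by an information criterion — can be sketched as follows. This is an illustrative numpy sketch, not the thesis code; the function name, grid search, and AIC form are assumptions.

```python
import numpy as np

def estimate_pitch_aic(x, fs, f0_grid, max_harmonics=10):
    """Illustrative sketch: pick the f0 whose harmonic sinusoidal
    least-squares fit minimizes the Akaike Information Criterion."""
    n = len(x)
    t = np.arange(n) / fs
    best_f0, best_aic = None, np.inf
    for f0 in f0_grid:
        # keep all harmonics below the Nyquist frequency
        h = min(max_harmonics, int(fs / 2 / f0))
        # design matrix of harmonically related sinusoids (k * f0)
        cols = []
        for k in range(1, h + 1):
            cols.append(np.cos(2 * np.pi * k * f0 * t))
            cols.append(np.sin(2 * np.pi * k * f0 * t))
        A = np.stack(cols, axis=1)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        rss = np.sum((x - A @ coef) ** 2)
        # AIC penalizes model size; the penalty matters when candidates
        # admit different numbers of harmonics
        k_params = A.shape[1] + 1  # amplitudes plus noise variance
        aic = n * np.log(rss / n) + 2 * k_params
        if aic < best_aic:
            best_f0, best_aic = f0, aic
    return best_f0
```

Under Gaussian noise, minimizing the residual sum of squares for a fixed model order coincides with maximum likelihood, which is why the AIC expression above uses `n*log(rss/n)`.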

    Nonlinearly consistent schemes for coupled problems in reactor analysis

    Conventional coupling paradigms used today to couple various physics components in reactor analysis problems can be inconsistent in their treatment of the nonlinear terms. This leads to the use of smaller time steps to maintain stability and accuracy, thereby increasing the computational time. These inconsistencies can be overcome by using better approximations to the nonlinear operator in a time-stepping strategy to regain the lost accuracy. This research aims at finding remedies that provide consistent coupling and time-stepping strategies with good stability properties and higher orders of accuracy. Consistent coupling strategies, namely predictive and accelerated methods, were introduced for several reactor transient accident problems, and their performance was analyzed for 0-D and 1-D models. The results indicate that consistent approximations can enhance the overall accuracy of conventional codes with such simple, nonintrusive techniques. A detailed analysis of a monoblock coupling strategy using time adaptation was also carried out for several higher-order implicit Runge-Kutta (IRK) schemes. The results indicate that adaptive time stepping provides better accuracy and reliability in the solution fields than constant-step methods, even across discontinuities in the transients. Moreover, the computational and memory requirements of such schemes make them attractive alternatives for use in conventional coupling codes.
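The distinction between a lagged (operator-split) coupling and a nonlinearly consistent one can be illustrated on a toy 0-D model. The model, coefficients, and function below are hypothetical stand-ins, not the thesis code: power with linear reactivity feedback coupled to a lumped temperature, advanced with backward Euler, where the two fields are either iterated to self-consistency (Picard) or updated in a single lagged pass.

```python
def backward_euler_coupled(p0, T0, dt, nsteps, a=0.05, consistent=True,
                           tol=1e-12, max_iter=50):
    """Toy 0-D coupled model (illustrative only):
         dp/dt = rho(T) * p               # power with reactivity feedback
         dT/dt = p - gamma * (T - T_ref)  # lumped temperature
       Backward Euler in time; fields are either iterated to nonlinear
       consistency (Picard) or lagged (one pass, operator split)."""
    rho0, gamma, T_ref = 0.1, 0.5, 1.0
    rho = lambda T: rho0 - a * (T - T_ref)
    p, T = p0, T0
    for _ in range(nsteps):
        p_new, T_new = p, T
        for _ in range(max_iter):
            # implicit update of each field using the other field's iterate
            p_next = p / (1.0 - dt * rho(T_new))
            T_next = (T + dt * (p_next + gamma * T_ref)) / (1.0 + dt * gamma)
            err = abs(p_next - p_new) + abs(T_next - T_new)
            p_new, T_new = p_next, T_next
            if not consistent or err < tol:
                break  # lagged coupling stops after a single pass
        p, T = p_new, T_new
    return p, T
```

With `consistent=True` each step solves the fully coupled implicit system, so no splitting error is introduced on top of the time-discretization error; the lagged variant is what a conventional non-iterative coupling would do.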

    High Resolution Numerical Methods for Coupled Non-linear Multi-physics Simulations with Applications in Reactor Analysis

    The modeling of nuclear reactors involves the solution of a multi-physics problem with widely varying time and length scales. This translates mathematically to solving a system of coupled, non-linear, and stiff partial differential equations (PDEs). Multi-physics applications possess the added complexity that most of the solution fields participate in various physics components, potentially yielding spatial and/or temporal coupling errors. This dissertation deals with the verification aspects associated with such a multi-physics code, i.e., the substantiation that the mathematical description of the multi-physics equations is solved correctly (both in time and space). Conventional paradigms employed in reactor analysis problems to couple various physics components are often non-iterative and can be inconsistent in their treatment of the non-linear terms. This leads to the usage of smaller time steps to maintain stability and accuracy requirements, thereby increasing the overall computational time for simulation. The inconsistencies of these weakly coupled solution methods can be overcome using tighter coupling strategies that yield a better approximation to the coupled non-linear operator, by resolving the dominant spatial and temporal scales involved in the multi-physics simulation. A multi-physics framework, KARMA (K(c)ode for Analysis of Reactor and other Multi-physics Applications), is presented. KARMA uses tight coupling strategies for various physical models based on a Matrix-free Nonlinear-Krylov (MFNK) framework in order to attain high-order spatio-temporal accuracy for all solution fields in reasonable wall-clock times for various test problems. The framework also utilizes traditional loosely coupled methods as lower-order solvers, which serve as efficient preconditioners for the tightly coupled solution. Since the software platform employs both lower- and higher-order coupling strategies, it can easily be used to test and evaluate different coupling strategies and numerical methods and to compare their efficiency for problems of interest. Multi-physics code verification efforts pertaining to reactor applications are described, and associated numerical results obtained using the developed multi-physics framework are provided. The versatility of the numerical methods used here for coupled problems, and the feasibility of general non-linear solvers with appropriate physics-based preconditioners in the KARMA framework, offer significantly more efficient techniques for solving multi-physics problems in reactor analysis.
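The matrix-free idea behind such frameworks is that Newton's method never needs the Jacobian explicitly, only its action on a vector, which a finite difference of the residual supplies. The sketch below is illustrative, not KARMA code: for a tiny system it assembles the Jacobian column-by-column from finite-difference matvecs, whereas a production MFNK code would hand the same matvec to a Krylov solver such as GMRES, preconditioned by a loosely coupled solve.

```python
import numpy as np

def jfnk_solve(F, u0, tol=1e-10, max_newton=20, eps=1e-7):
    """Newton iteration using only finite-difference Jacobian-vector
    products J v ~ (F(u + eps*v) - F(u)) / eps (illustrative sketch)."""
    u = np.asarray(u0, dtype=float)
    n = u.size
    for _ in range(max_newton):
        r = F(u)
        if np.linalg.norm(r) < tol:
            return u
        # Jacobian action without ever forming J analytically
        Jv = lambda v: (F(u + eps * v) - r) / eps
        # Tiny demo only: rebuild J from matvecs on the unit vectors;
        # a real matrix-free code passes Jv to GMRES instead.
        J = np.column_stack([Jv(e) for e in np.eye(n)])
        u = u + np.linalg.solve(J, -r)
    return u
```

Applied to a coupled steady-state residual (e.g., power-temperature feedback), the same routine finds the nontrivial equilibrium without any hand-coded Jacobian.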

    Multiple-Question Multiple-Answer Text-VQA

    Full text link
    We present Multiple-Question Multiple-Answer (MQMA), a novel approach to text-VQA in encoder-decoder transformer models. The text-VQA task requires a model to answer a question by understanding multi-modal content: text (typically from OCR) and an associated image. To the best of our knowledge, almost all previous approaches to text-VQA process a single question and its associated content to predict a single answer; to answer multiple questions from the same image, each question and the content must be fed into the model separately. In contrast, our proposed MQMA approach takes multiple questions and the content as input at the encoder and predicts multiple answers at the decoder, auto-regressively and simultaneously. We make several novel architectural modifications to standard encoder-decoder transformers to support MQMA. We also propose a novel MQMA denoising pre-training task designed to teach the model to align and delineate multiple questions and content with their associated answers. The MQMA pre-trained model achieves state-of-the-art results on multiple text-VQA datasets, each with strong baselines: absolute improvements of +2.5% on OCR-VQA, +1.4% on TextVQA, +0.6% on ST-VQA, and +1.1% on DocVQA over the previous state-of-the-art approaches.
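The input/output plumbing of such a scheme can be sketched at the string level. The separator tokens, function names, and serialization below are entirely hypothetical — the paper's actual tokenization and architecture changes are not specified here — but they show the packing/unpacking contract: one encoder input carrying several questions, one decoded string carrying several delimited answers.

```python
# Hypothetical separator tokens; the real MQMA vocabulary may differ.
Q_SEP, A_SEP = "<q>", "<a>"

def pack_questions(ocr_text, questions):
    """Serialize OCR content plus several questions into one encoder input."""
    return ocr_text + "".join(Q_SEP + q for q in questions)

def unpack_answers(decoded):
    """Split one auto-regressively decoded string into per-question answers."""
    return [a.strip() for a in decoded.split(A_SEP) if a.strip()]
```

The payoff is amortization: the (expensive) encoder pass over the image and OCR tokens runs once per image rather than once per question.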

    Learning Optimal Seeds for Diffusion-Based Salient Object Detection

    In diffusion-based saliency detection, an image is partitioned into superpixels and mapped to a graph, with superpixels as nodes and edge strengths proportional to superpixel similarity. Saliency information is then propagated over the graph using a diffusion process, whose equilibrium state yields the object saliency map. The optimal solution is the product of a propagation matrix and a saliency seed vector that contains a prior saliency assessment. This is obtained from either a bottom-up saliency detector or some heuristics. In this work, we propose a method to learn optimal seeds for object saliency. Two types of features are computed per superpixel: the bottom-up saliency of the superpixel region and a set of mid-level vision features informative of how likely the superpixel is to belong to an object. The combination of features that best discriminates between object and background saliency is then learned, using a large-margin formulation of the discriminant saliency principle. The propagation of the resulting saliency seeds, using a diffusion process, is finally shown to outperform the state of the art on a number of salient object detection datasets.
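The "product of a propagation matrix and a seed vector" admits a compact sketch. In one common formulation of graph diffusion (assumed here; the paper's exact normalization may differ), the equilibrium saliency is f = (D − αW)⁻¹ s, where W is the superpixel affinity matrix, D its degree matrix, and s the learned seed vector.

```python
import numpy as np

def diffuse_saliency(W, seed, alpha=0.5):
    """Equilibrium of a graph diffusion: f = (D - alpha*W)^{-1} seed.
    W is a symmetric nonnegative affinity matrix over superpixels;
    seed holds the prior (here: learned) saliency per superpixel."""
    D = np.diag(W.sum(axis=1))
    # Solving the linear system is equivalent to applying the
    # propagation matrix (D - alpha*W)^{-1} to the seed vector.
    return np.linalg.solve(D - alpha * W, seed)
```

On a chain graph with a single seeded node, the diffused saliency decays monotonically with graph distance from the seed, which is the qualitative behavior the method relies on.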

    DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models

    Encoder-decoder transformer models have achieved great success on various vision-language (VL) tasks, but they suffer from high inference latency. Typically, the decoder accounts for most of the latency because of auto-regressive decoding. To accelerate inference, we propose an approach of performing Dynamic Early Exit on Decoder (DEED). We build a multi-exit encoder-decoder transformer model trained with deep supervision, so that each of its decoder layers is capable of generating plausible predictions. In addition, we leverage simple yet practical techniques, including a shared generation head and adaptation modules, to maintain accuracy when exiting at shallow decoder layers. Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence in the current layer's prediction at each individual decoding step. Since different numbers of decoder layers may be used at different decoding steps, we compute the deeper-layer decoder features of previous decoding steps just-in-time, which ensures that the features from different decoding steps are semantically aligned. We evaluate our approach with two state-of-the-art encoder-decoder transformer models on various VL tasks and show that it reduces overall inference latency by 30%-60% with comparable or even higher accuracy compared to baselines.
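The step-level exit rule can be sketched in a few lines. This is a simplified stand-in, not the DEED implementation: each decoder layer is represented only by the logits its exit head would produce, and the step exits as soon as the top-1 softmax probability clears a confidence threshold.

```python
import numpy as np

def early_exit_step(layer_logits, threshold=0.9):
    """One decoding step with confidence-based early exit (sketch).
    layer_logits: per-decoder-layer logits over the vocabulary, in
    layer order. Returns (predicted token id, layers actually run)."""
    for depth, logits in enumerate(layer_logits, start=1):
        # numerically stable softmax over the vocabulary
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # exit early if confident, or fall through at the last layer
        if probs.max() >= threshold or depth == len(layer_logits):
            return int(probs.argmax()), depth
```

The just-in-time feature computation mentioned in the abstract is what makes this rule sound across steps: a later step that runs deep layers can still attend to deep-layer features of earlier, shallowly exited steps.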